Variable Selection for Discrimination of More Than Two Classes Where Data are Sparse
نویسندگان
چکیده
In classification, with an increasing number of variables, the required number of observations grows drastically. In this paper we present an approach to put into effect the maximal possible variable selection, by splitting a K class classification problem into pairwise problems. The principle makes use of the possibility that a variable that discriminates two classes will not necessarily do so for all such class pairs. We further present the construction of a classification rule based on the pairwise solutions by the Pairwise Coupling algorithm according to Hastie and Tibshirani (1998). The suggested proceedure can be applied to any classification method. Finally, situations with lack of data in multidimensional spaces are investigated on different simulated data sets to illustrate the problem and the possible gain. The principle is compared to the classical approach of linear and quadratic discriminant analysis. 1 Motivation and idea In most classification procedures, the number of unknown parameters grows more than linearly with dimension of the data. It may be desirable to apply a method of variable selection for a meaningful reduction of the set of used variables for the classification problem. In this paper an idea is presented as to how to maximally reduce the number of used variables in the classification rule in a manner of partial variable selection. To motivate this, consider the example of 5 classes distributed in a variable as it is shown in figure 1. It will be hardly possible to discriminate e.g. whether an observation is of class 1 or 2. An object of class 5 instead will probably be well recognized. The following matrix (rows and coloumns denoting the classes)
منابع مشابه
Hyperspectral Image Classification Based on the Fusion of the Features Generated by Sparse Representation Methods, Linear and Non-linear Transformations
The ability of recording the high resolution spectral signature of earth surface would be the most important feature of hyperspectral sensors. On the other hand, classification of hyperspectral imagery is known as one of the methods to extracting information from these remote sensing data sources. Despite the high potential of hyperspectral images in the information content point of view, there...
متن کاملA Total Ratio of Vegetation Index (TRVI) for Shrubs Sparse Cover Delineating in Open Woodland
Persian juniper and Pistachio are grown in low density in the rangelands of North-East of Iran. These rangelands are populated by evergreen conifers, which are widespread and present at low-density and sparse shrub of pistachio in Iran, that are not only environmentally but also genetically essential as seed sources for pistachio improvement in orchards. Rangelands offer excellent opportunities...
متن کاملFace Recognition using an Affine Sparse Coding approach
Sparse coding is an unsupervised method which learns a set of over-complete bases to represent data such as image and video. Sparse coding has increasing attraction for image classification applications in recent years. But in the cases where we have some similar images from different classes, such as face recognition applications, different images may be classified into the same class, and hen...
متن کاملImage Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...
متن کاملA Sparse Representation-Based Algorithm for Pattern Localization in Brain Imaging Data Analysis
Considering the two-class classification problem in brain imaging data analysis, we propose a sparse representation-based multi-variate pattern analysis (MVPA) algorithm to localize brain activation patterns corresponding to different stimulus classes/brain states respectively. Feature selection can be modeled as a sparse representation (or sparse regression) problem. Such technique has been su...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005